EIRGRID17

Introduction

We worked with the df package from aimsir17. Named after the Irish name for weather, this package contains tidied data from the Irish Meteorological Service’s hourly observations for 2017 split into 15-minute intervals. In all, the data sets include observations from 25 weather stations as well as their locations. The data set details energy generation data for Ireland and Northern Ireland, energy demand data and wind generation data which was the primary focus of our study.

We will compare the patterns in the data of the variables. Comparisons will be made between NIGeneration and NIDemand. Similarly, IEGeneration and IEDemand will be examined to see what trends are present between these highly correlated variables. We are looking to see if there are any outliers in our data. Is Generation meeting demand? This will be important to determine if there are any blackouts or if there is surplus energy that could be exported. What influences demand trends across IE and NI? When is demand at its highest? How much of a role does Wind Generation play in meeting Total energy generation and moreover demand across both regions of the study? Do wind patterns change much across the year? We will also look at the relationship the variables have with the one dependent variable, SNSP. To do this, we will need to visualize our data. We will be using tidyverse and plotly to plot these and aid us in our analysis. Dplyr will be used as well as base R calculations to help summarize the data, which is important considering how it has been collected in 15-minute intervals.

Dataset information

glimpse(df)
## Rows: 35,040
## Columns: 15
## $ year               <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
## $ month              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ day                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ hour               <int> 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, …
## $ minute             <int> 0, 15, 30, 45, 0, 15, 30, 45, 0, 15, 30, 45, 0, 15,…
## $ date               <dttm> 2017-01-01 00:00:00, 2017-01-01 00:15:00, 2017-01-…
## $ NIGeneration       <dbl> 889.005, 922.234, 908.122, 918.802, 882.441, 848.86…
## $ NIDemand           <dbl> 775.931, 770.233, 761.186, 742.718, 749.238, 742.45…
## $ NIWindAvailability <dbl> 175.065, 182.866, 169.796, 167.501, 174.094, 189.92…
## $ NIWindGeneration   <dbl> 198.202, 207.765, 193.103, 190.757, 195.790, 212.95…
## $ IEGeneration       <dbl> 3288.57, 3282.12, 3224.27, 3171.27, 3190.28, 3184.6…
## $ IEDemand           <dbl> 2921.44, 2884.19, 2806.38, 2718.77, 2682.91, 2649.8…
## $ IEWindAvailability <dbl> 1064.79, 965.60, 915.35, 895.38, 1028.03, 1144.17, …
## $ IEWindGeneration   <dbl> 1044.72, 957.74, 900.46, 870.81, 998.31, 1119.12, 1…
## $ SNSP               <chr> "28.4%", "26.4%", "25.2%", "24.7%", "27.9%", "31.4%…

Data Wrangling

We had over 35000 rows and 17 columns. Upon initial viewing of the data, we saw that the first 6 columns alone were dealing with the timestamp of observations. We filtered out unnecessary columns such as Year, day, and date to reduce the number of columns from x to y. New columns added to the dataset included NIWIndPercent and NIWindDifference which shows the amount of wind-generated power that satisfies NI total demand and the amount of
We mutated the SNSP column to make it numeric. Certain values were strange such as certain values for wind generation exceeding wind availability.

Are there any missing values in the dataset?

# count of the missing values
sum(is.na(df))
## [1] 0

We mutated the SNSP column to make it numeric.

# As we can see that the SNSP col is in the string format, Hence for our calculation we need to convert it to numeric 
df$SNSP<-strsplit(df$SNSP, "%")
df$SNSP<-as.numeric(df$SNSP)/100

Box plot representation will show the overall idea of the dataset.

fig_box <- plot_ly(type = 'box')
fig_box <- fig_box |> add_boxplot(y = df$NIGeneration, name = "NIGeneration", boxpoints = 'suspectedoutliers',
                                   marker = list(color = 'rgb(7,40,89)',
                                                 outliercolor = 'rgba(7, 40, 89, 0.6)',
                                                 line = list(outliercolor = 'rgba(7, 40, 89, 1.0)',
                                                             outlierwidth = 2)),
                                   line = list(color = 'rgb(8,181,116)'))


fig_box <- fig_box |> add_boxplot(y = df$NIDemand, name = "NIDemand", boxpoints = FALSE,
                           marker = list(color = 'rgb(9,56,125)'),
                           line = list(color = 'rgb(9,56,125)'))
fig_box <- fig_box |> add_boxplot(y = df$NIWindAvailability, name = "NIWindAvailability", boxpoints = 'suspectedoutliers',
                           marker = list(color = 'rgb(8,81,156)',
                                         outliercolor = 'rgba(219, 64, 82, 0.6)',
                                         line = list(outliercolor = 'rgba(219, 64, 82, 1.0)',
                                                     outlierwidth = 2)),
                           line = list(color = 'rgb(8,81,156)'))
fig_box <- fig_box |> add_boxplot(y = df$NIWindGeneration, name = "NIWindGeneration", boxpoints = 'outliers',
                           marker = list(color = 'rgb(107,174,214)'),
                           line = list(color = 'rgb(107,174,214)'))

fig_box <- fig_box |> add_boxplot(y = df$IEGeneration, name = "IEGeneration", boxpoints = 'suspectedoutliers',
                           marker = list(color = 'rgb(8,81,056)',
                                         outliercolor = 'rgba(219, 64, 12, 0.6)',
                                         line = list(outliercolor = 'rgba(219, 64, 82, 1.0)',
                                                     outlierwidth = 2)),
                           line = list(color = 'rgb(8,81,156)'))


fig_box <- fig_box |> add_boxplot(y = df$IEDemand, name = "IEDemand", boxpoints = 'suspectedoutliers',
                           marker = list(color = 'rgb(100,81,056)',
                                         outliercolor = 'rgba(119, 64, 12, 0.6)',
                                         line = list(outliercolor = 'rgba(119, 64, 82, 1.0)',
                                                     outlierwidth = 2)),
                           line = list(color = 'rgb(8,181,156)'))



fig_box <- fig_box |> add_boxplot(y = df$IEWindAvailability, name = "IEWindAvailability", boxpoints = 'suspectedoutliers',
                           marker = list(color = 'rgb(140,181,056)',
                                         outliercolor = 'rgba(139, 24, 12, 0.6)',
                                         line = list(outliercolor = 'rgba(169, 64, 82, 1.0)',
                                                     outlierwidth = 2)),
                           line = list(color = 'rgb(8,181,156)'))


fig_box <- fig_box |> add_boxplot(y = df$IEWindGeneration, name = "IEWindGeneration", boxpoints = 'suspectedoutliers',
                           marker = list(color = 'rgb(140,101,156)',
                                         outliercolor = 'rgba(140, 104, 112, 0.6)',
                                         line = list(outliercolor = 'rgba(169, 64, 82, 1.0)',
                                                     outlierwidth = 2)),
                           line = list(color = 'rgb(8,181,156)'))

fig_box <- fig_box |> layout(title = "Box Plot For Each Variable")

fig_box

Figure 1: Boxplot for the main 8 columns which shows its distribution and outlier aspects

Analysis

  • NIGeneration and NIDemand

At first glance at NIDemand and NIgeneration, we need to know the best-performing months and worst-performing months above average.

max_above_avg <- df|>
  
      filter(NIGeneration>mean(NIGeneration))|>
      count(month)|>
      slice_max(order_by = n)


min_above_avg <- df|>
  filter(NIGeneration>mean(df$NIGeneration))|>
  count(month)|>
  slice_min(order_by = n)

mm_avg <- cbind(max_above_avg,min_above_avg)

print(mm_avg,row.names=FALSE)
##  month    n month    n
##      1 1829     5 1162

Here we got January and May; let’s compare these two graphs and check the correlation between them.

#Although we have added the caption for this figure our knitted file is not including this part, so we have mentioned in another way.
library(crosstalk)
tx <- SharedData$new(df)
gg <- ggplot(tx) + geom_line(aes(NIGeneration, NIDemand, group = month,color=month,alpha=0.5))

filt <- filter_select("id", "Select a month", tx, ~month)

bscols(
  filt,
  ggplotly(gg, dynamicTicks = TRUE),
  widths = 12
)
    Figure 2: Interactive graph does the comparison between months for NI Demand and Generation

After looking at the above graph, we may think, how many times does the NIDemand is getting fulfilled by the NIGeneration?

sd <- df|>
    mutate(diff=NIGeneration-NIDemand)|>
      filter(diff>0)|>
      count(NI_ff=diff>0)

qw <- df|>
  mutate(diff=NIGeneration-NIDemand)|>
  filter(diff<0)|>
  count(NI_uf=diff<0)
percent_calc <- (sd/(sd+qw))*100
cat("Demand is fullfilled by generation",sd[1,2],"times.","Demand is not fullfilled by generation",qw[1,2],"times" ," which means that around",percent_calc[1,2],"% of times the generation met demand")
## Demand is fullfilled by generation 19782 times. Demand is not fullfilled by generation 15258 times  which means that around 56.45548 % of times the generation met demand

Since the value is around 56%, Does the range of NI gen and NI Demand follow a similar pattern?

r1 <- summarise(df,range=max(NIGeneration)-min(NIGeneration))
r2 <- summarise(df,range=max(NIDemand)-min(NIDemand))  
print(cbind(r1,r2),row.names=FALSE)
##     range    range
##  1162.519 1170.843

Is it possible to know at what period of time the energy demand is the highest across the Year?

# Here We have calculated the threshold value by the range of NIDemand
threshold <- 1170

df|>
      filter(NIDemand>threshold)|>
      count(hour)|>
      slice_max(order_by=n,n=3)
##   hour   n
## 1   17 678
## 2   18 630
## 3   19 581

Let’s visualize the above findings with NIGeneration and NIDemand graph with respect to date.

library(plotly)
fig_plot <- plot_ly(df, x = ~date)
fig_plot <- fig_plot |> add_lines(y = ~NIGeneration, name = "NIGeneration")
fig_plot <- fig_plot |> add_lines(y = ~NIDemand, name = "NIDemand", visible = F,alpha=0.8)
fig_plot <- fig_plot |> add_lines(y = ~IEGeneration, name = "IEGeneration", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~IEDemand, name = "IEDemand", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~NIWindAvailability, name = "NIWindAvailability", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~NIWindGeneration, name = "NIWindGeneration", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~IEWindAvailability, name = "IEWindAvailability", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~IEWindGeneration, name = "IEWindGeneration", visible = F)
fig_plot <- fig_plot |> layout(
  xaxis = list(
    domain = c(0.1, 1),
    rangeselector = list(
      buttons = list(
        list(
          count = 3,
          label = "3 mo",
          step = "month",
          stepmode = "backward"),
        list(
          count = 6,
          label = "6 mo",
          step = "month",
          stepmode = "backward"),
        list(
          count = 1,
          label = "1 yr",
          step = "year",
          stepmode = "backward"),
        list(
          count = 1,
          label = "YTD",
          step = "year",
          stepmode = "todate"),
        list(step = "all"))),
    
    rangeslider = list(type = "date")),
  
  yaxis = list(title = "y"),
  updatemenus = list(
    list(
      y = 0.7,
      buttons = list(
        list(method = "restyle",
             args = list("visible", list(TRUE, FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
             label = "NIGeneration"),
        list(method = "restyle",
             args = list("visible", list(TRUE, TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
             label = "NIGenerationvsNIDemand"),
        list(method = "restyle",
             args = list("visible", list(FALSE, TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
             label = "NIDemand"),
        list(method = "restyle",
             args = list("visible", list(FALSE,FALSE, TRUE,FALSE,FALSE,FALSE,FALSE,FALSE)),
             label = "IEGeneration"),
        list(method = "restyle",
             args = list("visible", list(FALSE,FALSE,FALSE, TRUE,FALSE,FALSE,FALSE,FALSE)),
             label = "IEDemand"),
        list(method = "restyle",
             args = list("visible", list(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)),
             label = "NIWindAvailability"),
        list(method = "restyle",
             args = list("visible", list(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE)),
             label = "NIWindGeneration"),
        list(method = "restyle",
             args = list("visible", list(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)),
             label = "IEWindAvailability"),
        list(method = "restyle",
             args = list("visible", list(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE)),
             label = "IEWindGeneration")
      ))
  )
)

fig_plot

Figure 3: Visual representation of all parameters with respect to date.

  • IEDemand and IEGeneration

Does the IEGeneration fullfils the IEDemand in the year 2017? What is the mean/median of IE generation and demand on any given month in 2017?

sd_IE <- df|>
    mutate(diff=IEGeneration-IEDemand)|>
      filter(diff>0)|>
      count(NI_ff=diff>0)

qw_IE <- df|>
  mutate(diff=IEGeneration-IEDemand)|>
  filter(diff<0)|>
  count(NI_uf=diff<0)
percent_calc_IE <- (sd_IE/(sd_IE+qw_IE))*100
cat(percent_calc_IE[1,2],"% of times the generation met demand for republic of  ireland")
## 62.1165 % of times the generation met demand for republic of  ireland
mean_IE <- df |> 
group_by(month) |>
summarize(mean = mean(IEGeneration))|>
slice_max(mean)  

median_IE <-dem <- df |> 
group_by(month) |>
summarize(median = median(IEDemand))|>
slice_max(median)  

cbind(mean_IE,median_IE)
##   month     mean month  median
## 1     1 3618.783    11 3568.14

Correlation of IEGeneration with IEDemand

As we can see from the plot the variance and correlation between IEDemand and IEGeneration is decreasing in the first half of the year and augmented in the later part.

fig_an <- df %>%
  plot_ly(
    x = ~df$IEDemand,
    y = ~df$IEGeneration,
    frame = ~df$month,
    type = 'scatter',
    mode = 'markers',
    color = ~df$month,
    showlegend = F
  )

fig_an

Figure 4: IE demand Vs Generation month wise display

  • NIwindavailability and NIwindgeneration

General wind availability and wind generation patterns for Northern Ireland.

NImeanWind <- mean(df$NIWindAvailability)
NImeanGen <- mean(df$NIWindGeneration)
NImeanRatio <- NImeanGen/NImeanWind
cat("NI Mean Wind Availability = ",NImeanWind,"\n","NI Mean Wind Generation = ", NImeanGen,"\n","Mean Proportion of Wind Availability Captured = ", NImeanRatio)
## NI Mean Wind Availability =  251.0544 
##  NI Mean Wind Generation =  232.1795 
##  Mean Proportion of Wind Availability Captured =  0.9248176

How much of NI Demand is satisfied by Wind Generation?

df$NIwindPercent <- df$NIWindGeneration/df$NIDemand
medianNIwindPercent <- median(df$NIwindPercent)
sdNIwindPercent <- sd(df$NIwindPercent)
meanNIwindPercent <- mean(df$NIwindPercent)

cat("The mean proportion of 2017 NI energy demand satisfied by wind generation  = ", meanNIwindPercent,"\n")
## The mean proportion of 2017 NI energy demand satisfied by wind generation  =  0.2669297
  • IEWindAvailability and IEWindGeneration

Which month has the maximum and minimum Wind Generation and Availability in the whole year?

zx <- df |> group_by(month) |> summarise(IEMonthlyWindAvailability = sum(IEWindAvailability))|> slice_max(IEMonthlyWindAvailability)

xz <- df |> group_by(month) |> summarise(IEMonthlyWindGeneration = sum(IEWindGeneration))|> slice_max(IEMonthlyWindGeneration)


po <- df |> group_by(month) |> summarise(IEMonthlyWindGeneration = sum(IEWindGeneration, na.rm=TRUE))|> slice_min(IEMonthlyWindGeneration)

op <- df |> group_by(month) |> summarise(IEMonthlyWindAvailability = sum(IEWindAvailability, na.rm=TRUE))|> slice_min(IEMonthlyWindAvailability)


print(rbind(cbind(zx,xz),cbind(po,op)),row.names=FALSE)
##  month IEMonthlyWindAvailability month IEMonthlyWindGeneration
##     12                   3452704     2                 3275386
##      7                   1756577     7                 1624302

What is the contribution of Wind Generated Energy to Overall Generated Energy?

df <- mutate(df, contr = (df$IEWindGeneration/df$IEGeneration)*100)
mean(df$contr)
## [1] 24.89941
bn <- df |> group_by(month) |> summarise(Monthlycontr = mean(contr, na.rm=TRUE))|> slice_min(Monthlycontr)

nb <- df |> group_by(month) |> summarise(Monthlycontr = mean(contr, na.rm=TRUE))|> slice_max(Monthlycontr)

print(cbind(bn,nb),row.names=FALSE)
##  month Monthlycontr month Monthlycontr
##      7     18.03937     2     33.37084

Are they able to use the overall wind availability for generating energy?

df <- mutate(df, IE_diff = (df$IEWindAvailability - df$IEWindGeneration))
sum(df$IE_diff)
## [1] 2223307
yt <- df |> group_by(month) |> summarise(monthlyIE_diff = sum(IE_diff))|> slice_max(monthlyIE_diff)

ty <- df |> group_by(month) |> summarise(monthlyIE_diff = sum(IE_diff))|> slice_min(monthlyIE_diff)

print(cbind(yt,ty),row.names=FALSE)
##  month monthlyIE_diff month monthlyIE_diff
##     10       387466.8     1       107605.1
Comparison

These histograms demonstrate the pattern for wind availability and wind generation across Ireland.

library(plotly)
fig_hist1 <- plot_ly(
  type='histogram',
  x=df$NIWindAvailability,
  name = "NIWindAvailability",
  bingroup=1)

fig_hist1 <- fig_hist1 |> add_trace(
  type='histogram',
  x=df$NIWindGeneration,
  name = "NIWindGeneration",
  bingroup=1,
  alpha=0.5)

fig_hist1 <- fig_hist1 |> layout(
  barmode="overlay",
  bargap=0.1)

fig_hist2 <- plot_ly(
  type='histogram',
  x=df$IEWindAvailability,
  name = "IEWindAvailability",
  bingroup=1)

fig_hist2 <- fig_hist2 |> add_trace(
  type='histogram',
  x=df$IEWindGeneration,
  name = "IEWindGeneration",
  bingroup=1,
  alpha=0.4)

fig_hist2 <- fig_hist2 |> layout(
  barmode="overlay",
  bargap=0.1)

subplot(fig_hist1,fig_hist2)

Figure 5: Histogram plots explaining Wind parameter aspects between the regions

Are there Relationships between two quantitative variables With respect to SNSP?

Here the X-axis is (NIDemand, NIDemand), the y-axis (NIGeneration, IEGeneration) and the Z-axis is SNSP

fig_3d_1 <- plot_ly(x = df$IEGeneration, y = df$IEDemand, z = df$SNSP, scene='scene') |>
  add_markers(color = df$month)

fig_3d_2 <- plot_ly(x = df$NIGeneration, y = df$NIDemand, z = df$SNSP, scene='scene2') |>
  add_markers(color = df$month)

fig_3d_f <- subplot(fig_3d_1,fig_3d_2)

fig_3d_f <- fig_3d_f %>% layout(title = "3D Subplots",
         scene = list( 
                      aspectmode='cube'),
         scene2 = list(
                       aspectmode='cube'))
fig_3d_f

Figure 6: 3D visualization of demand and generation parameters from NI and IE

Facet wrap for NI and IE

fc1 <- ggplot(data = df ) + 
  geom_point(mapping = aes(x = NIGeneration, y =  NIDemand, color=month)) + 
  facet_wrap(vars(month))

fc2 <- ggplot(data = df ) +
  geom_point(mapping = aes(x = IEGeneration, y =  IEDemand, color=month)) +
  facet_wrap(vars(month))

fc1
Figure 7: Facet wrap plot showing generation and demand between the regions

Figure 7: Facet wrap plot showing generation and demand between the regions

fc2
Figure 7: Facet wrap plot showing generation and demand between the regions

Figure 7: Facet wrap plot showing generation and demand between the regions

Conclusion

  • Above average performance for NI generation tops in the month of January and least in the month of May. NI demand and generation values range on an average of 1160-1170. IE generation follows a symmetric pattern and IE demand follows a left-skewed pattern, the maximum IE generation and IE demand range from 3568 - 3618.

  • The columns which have outliers in them are NIGeneration, NIWindGeneration, and IEGeneration, they all constitute less than 1% so there is no need to treat them.

  • Around 56% of the time generation fulfills the demand for Northern Ireland. Around 62% of the Republic of Ireland’s generation fulfills the demand which is 6% more than that of Northern Ireland.

  • Maximum demand in Ireland occurs in the month of December, and Maximum demand across Northern Ireland occurs in the month of January.

  • As we can see in the above plots and analyses demand is usually higher in the winter months and low in the summer months. This can be influenced by two major factors, the First being the heating bills but here they do not purely rely on electric heating as oil heating is also used. Second is Daylight Savings as it gets dark very early so the lightning bill can be fluxed as in summer you will have daylight till 8-9 pm but in winter it’s only till 4-5 pm.

  • Around 92% of Wind Availability is converted into Wind Generation for Northern Ireland. 26% of the overall energy generated in Northern Ireland is from Wind Generation. For the Republic of Ireland, this percentage drops to 24%, but as we dig deep we found that for the month of February the share of wind generation in overall generation is approx 33% i.e. 1/3rd of the energy is generated using wind only. Also, for the Republic of Ireland, there are 2223307 residual units of Wind Availability in the whole year.

  • As we can see in the above-described plot Wind Availability and Wind Generation for both Northern Ireland and the Republic of Ireland overlap most of the time, which means that most of the Available wind is getting used for generation.

  • After comparing Demand and Generation w.r.t SNSP for both regions, we found that in the Republic of Ireland as the values are high, there is a linear trend associated with SNSP. As the values in Northern Ireland are less, there we see a loosely packed trend therefore we can infer that there is more variance while associating it with SNSP.

Credits

I, Abhishek Prasad Teni, had the responsibility to analyze NIGeneration and NIDemand parameters. Moreover, I plotted graphs for the entire report and was also assigned data wrangling and merging the data into the report.

I, Sanidhya vishnoi, had the primary responsibility to scrutinize the IEWindGeneration and IEWindAvailability part of the data. Moreover, I was assigned to write the conclusion and finalize the code structure.

I, Alan Rigney, had primary responsibility for the material in NIWindGeneration and NIWindAvailability. It was my aim to discover the patterns in these two columns, how these contributed to overall demand in NI and to investigate surplus wind availability in the country. I was also tasked with presenting the data to the class.

I, Giridharan Sridharan, picked the columns - IEGeneration & IEDemandand from the Eirgrid17 dataset and was responsible for analyzing these columns and coming up with my findings. Moreover, I was tasked with writing the analysis for plots.